{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 批量预测"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在进行模型评估或其他任务时，通常需要对大量数据进行预测。然而，模型推理过程往往耗时较长，通过循环串行执行会增加整体时间成本，而并行执行则需要额外的开发工作。\n",
    "\n",
    "SDK 提供了多种解决方案来应对这一场景，其中包括：\n",
    "\n",
    "- [本地并行推理](https://github.com/baidubce/bce-qianfan-sdk/blob/main/cookbook/batch_prediction.ipynb)：利用 SDK 内置的批量推理功能，在本地通过并行调用模型接口实现高效的批量预测。\n",
    "- [数据集评估](https://github.com/baidubce/bce-qianfan-sdk/blob/main/cookbook/dataset/batch_inference_using_dataset.ipynb)：利用 SDK 的 Dataset 模块，调用平台提供的数据集评估功能，以便快速而有效地完成任务。\n",
    "- [离线批量推理](https://github.com/baidubce/bce-qianfan-sdk/blob/main/cookbook/offline_batch_inference.ipynb)：对于时间要求不那么严格的场景，可以考虑利用平台提供的离线批量预测能力，以降低实时推理的负载压力。\n",
    "\n",
    "本文将介绍第一种解决方案，即本地并行推理。\n",
    "\n",
    "注意：本文的方法需要 SDK 版本 >= 0.1.4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:38.489998Z",
     "start_time": "2024-02-21T08:12:38.258711Z"
    }
   },
   "outputs": [],
   "source": [
    "import qianfan\n",
    "import os\n",
    "\n",
    "# 这里请根据 SDK 文档获取自己的 access key 和 secret key\n",
    "os.environ[\"QIANFAN_ACCESS_KEY\"] = \"your access key\"\n",
    "os.environ[\"QIANFAN_SECRET_KEY\"] = \"your secret key\"\n",
    "# 由于并行会带来较大的并发量，容易触发 QPS 限制导致请求失败\n",
    "# 因此建议这里为 SDK 设置一个合理的 QPS 限制，SDK 会根据该数值进行流控\n",
    "os.environ[\"QIANFAN_QPS_LIMIT\"] = \"3\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 准备数据"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "需要准备的数据与需要使用的模型类型相关，例如 `Completion` 接受的参数 `prompt` 类型是 `str`，那么批量预测时需要使用的就是 `List[str]`。\n",
    "\n",
    "这里以 CMMLU 数据集为例，使用 `Completion` 模型进行批量预测。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:41.478254Z",
     "start_time": "2024-02-21T08:12:38.490827Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: datasets in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (2.16.1)\r\n",
      "Requirement already satisfied: aiohttp in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (3.9.3)\r\n",
      "Requirement already satisfied: numpy>=1.17 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (1.23.4)\r\n",
      "Requirement already satisfied: pyarrow>=8.0.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (15.0.0)\r\n",
      "Requirement already satisfied: pyyaml>=5.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (6.0.1)\r\n",
      "Requirement already satisfied: filelock in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (3.6.0)\r\n",
      "Requirement already satisfied: huggingface-hub>=0.19.4 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (0.20.3)\r\n",
      "Requirement already satisfied: xxhash in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (3.4.1)\r\n",
      "Requirement already satisfied: pyarrow-hotfix in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (0.6)\r\n",
      "Requirement already satisfied: pandas in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (1.4.4)\r\n",
      "Requirement already satisfied: packaging in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (23.2)\r\n",
      "Requirement already satisfied: requests>=2.19.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (2.31.0)\r\n",
      "Requirement already satisfied: dill<0.3.8,>=0.3.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (0.3.7)\r\n",
      "Requirement already satisfied: fsspec[http]<=2023.10.0,>=2023.1.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (2023.10.0)\r\n",
      "Requirement already satisfied: multiprocess in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (0.70.15)\r\n",
      "Requirement already satisfied: tqdm>=4.62.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from datasets) (4.66.2)\r\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (1.9.4)\r\n",
      "Requirement already satisfied: aiosignal>=1.1.2 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (1.3.1)\r\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (6.0.4)\r\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (1.4.1)\r\n",
      "Requirement already satisfied: attrs>=17.3.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (23.2.0)\r\n",
      "Requirement already satisfied: async-timeout<5.0,>=4.0 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from aiohttp->datasets) (4.0.3)\r\n",
      "Requirement already satisfied: typing-extensions>=3.7.4.3 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from huggingface-hub>=0.19.4->datasets) (4.9.0)\r\n",
      "Requirement already satisfied: idna<4,>=2.5 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (3.6)\r\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (3.3.2)\r\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (2023.11.17)\r\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from requests>=2.19.0->datasets) (1.26.18)\r\n",
      "Requirement already satisfied: python-dateutil>=2.8.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from pandas->datasets) (2.8.2)\r\n",
      "Requirement already satisfied: pytz>=2020.1 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from pandas->datasets) (2022.1)\r\n",
      "Requirement already satisfied: six>=1.5 in /Users/guoweiming/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.8.1->pandas->datasets) (1.16.0)\r\n"
     ]
    }
   ],
   "source": [
    "# 这里使用 HuggingFace 上的数据集，如果已经有数据集可以跳过这步\n",
    "!pip install datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:51.710332Z",
     "start_time": "2024-02-21T08:12:41.478987Z"
    }
   },
   "outputs": [],
   "source": [
    "import datasets\n",
    "dataset = datasets.load_dataset(\"haonan-li/cmmlu\", 'chinese_literature')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:51.739693Z",
     "start_time": "2024-02-21T08:12:51.711565Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "《日出》的结构采用的是\n",
      "以眉间尺为父报仇作为中心线索的小说是\n",
      "“他只是遗憾／他的祖先没有像他一样想过／不然，见到大海的该是他了”。以此作为结尾的诗歌是\n",
      "下列含有小说《伤逝》的鲁迅作品集是\n",
      "汪静之属于\n",
      "杂文《春末闲谈》中，细腰蜂的故事所要表达的是\n",
      "臧克家的第一部诗集是\n",
      "下列小说中运用了“戏剧穿插法”的是\n",
      "台静农的《记波外翁》中波外翁的性格是\n",
      "下列哪一项不属于巴金的短篇小说集\n"
     ]
    }
   ],
   "source": [
    "# 这里仅取前 10 个样本进行测试\n",
    "# data 只需要是 List[str] 类型即可\n",
    "data = dataset['test']['Question'][:10]\n",
    "for i in data:\n",
    "    print(i)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 批量预测"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以将需要预测的所有数据放在一个 List 中，然后调用 `batch_do` 方法，SDK 会在后台启动数个线程并行处理，线程数量由 `worker` 参数指定。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:51.740119Z",
     "start_time": "2024-02-21T08:12:51.718514Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:12:51] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:51] openapi_requestor.py:244 [t:13147648000]: requesting llm api endpoint: /chat/eb-instant\n"
     ]
    }
   ],
   "source": [
    "r = qianfan.Completion().batch_do(data, worker_num=5)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "返回的结果 `r` 是一个 future 对象，可以调用 `r.result()` 等待所有数据预测完成，并返回结果。\n",
    "\n",
    "或者也可以直接遍历 r，逐个获取结果，在遇到未完成的任务时再等待。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:57.910669Z",
     "start_time": "2024-02-21T08:12:51.728943Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:12:51] openapi_requestor.py:244 [t:13198016512]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:51] openapi_requestor.py:244 [t:13231595520]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:51] openapi_requestor.py:244 [t:13214806016]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:51] utils.py:126 [t:13181227008]: no event loop in thread `ThreadPoolExecutor-2_1`, async feature won't be available. Please make sure the object is initialized in the thread with event loop.\n",
      "[INFO] [02-21 16:12:51] utils.py:126 [t:13147648000]: no event loop in thread `ThreadPoolExecutor-2_0`, async feature won't be available. Please make sure the object is initialized in the thread with event loop.\n",
      "[INFO] [02-21 16:12:51] utils.py:126 [t:13198016512]: no event loop in thread `ThreadPoolExecutor-2_2`, async feature won't be available. Please make sure the object is initialized in the thread with event loop.\n",
      "[INFO] [02-21 16:12:51] utils.py:126 [t:13231595520]: no event loop in thread `ThreadPoolExecutor-2_4`, async feature won't be available. Please make sure the object is initialized in the thread with event loop.\n",
      "[INFO] [02-21 16:12:51] utils.py:126 [t:13214806016]: no event loop in thread `ThreadPoolExecutor-2_3`, async feature won't be available. Please make sure the object is initialized in the thread with event loop.\n",
      "[INFO] [02-21 16:12:51] oauth.py:207 [t:13181227008]: trying to refresh access_token for ak `rRlk1M***`\n",
      "[INFO] [02-21 16:12:51] oauth.py:207 [t:13147648000]: trying to refresh access_token for ak `rRlk1M***`\n",
      "[INFO] [02-21 16:12:51] oauth.py:207 [t:13198016512]: trying to refresh access_token for ak `rRlk1M***`\n",
      "[INFO] [02-21 16:12:51] oauth.py:207 [t:13231595520]: trying to refresh access_token for ak `rRlk1M***`\n",
      "[INFO] [02-21 16:12:51] oauth.py:207 [t:13214806016]: trying to refresh access_token for ak `rRlk1M***`\n",
      "[INFO] [02-21 16:12:51] oauth.py:220 [t:13181227008]: sucessfully refresh access_token\n",
      "[INFO] [02-21 16:12:51] oauth.py:220 [t:13231595520]: sucessfully refresh access_token\n",
      "[INFO] [02-21 16:12:51] oauth.py:220 [t:13214806016]: sucessfully refresh access_token\n",
      "[INFO] [02-21 16:12:51] oauth.py:220 [t:13198016512]: sucessfully refresh access_token\n",
      "[INFO] [02-21 16:12:51] oauth.py:220 [t:13147648000]: sucessfully refresh access_token\n",
      "[INFO] [02-21 16:12:52] openapi_requestor.py:244 [t:13214806016]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:53] openapi_requestor.py:244 [t:13231595520]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:54] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:55] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:55] openapi_requestor.py:244 [t:13214806016]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:57] base.py:89 [t:13214806016]: All tasks finished, exeutor will be shutdown\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "问题0：\n",
      "《日出》的结构采用的是\n",
      "《日出》是曹禺的作品，它的结构采用的是戏剧结构。这部剧作通过一个普通小旅馆中的一群人物的群戏，展示了现实社会中压迫者的凶残、被压迫者的善良，并以其强大的社会纵深感揭示了社会矛盾的复杂性。\n",
      "\n",
      "曹禺的戏剧结构注重情节的紧凑和人物关系的构建，通过精心设计的情节和人物关系，将各种矛盾冲突推向高潮，从而达到揭示社会矛盾和人性冲突的目的。同时，曹禺也善于运用象征、暗示、隐喻等手法，增强戏剧的视觉化和艺术性。\n",
      "\n",
      "《日出》的结构遵循了戏剧的结构原则，通过一系列的事件和人物活动，逐步展现出复杂的社会矛盾和人性的挣扎。因此，可以认为《日出》的结构采用的是戏剧结构。\n",
      "=================\n",
      "问题1：\n",
      "以眉间尺为父报仇作为中心线索的小说是\n",
      "以眉间尺为父报仇作为中心线索的小说有：\n",
      "\n",
      "1. 《眉间尺》是孙见喜所著的历史小说，讲述春秋时期吴越争霸时期，忠臣伍子胥报仇雪恨的故事。\n",
      "2. 《血仇》是刘流所著的小说，讲述了眉间尺为父报仇的故事。\n",
      "\n",
      "此外，还有《中国剑侠传奇：眉间尺》等。如果您需要更多信息，可以到相关网站上查询。\n",
      "=================\n",
      "问题2：\n",
      "“他只是遗憾／他的祖先没有像他一样想过／不然，见到大海的该是他了”。以此作为结尾的诗歌是\n",
      "根据您提供的诗歌，我理解这首诗可能反映了一种失落或者对于自己生活和身份的不满情绪。对于海洋的渴望可能是渴望更大的冒险，或者探索新的可能性。这样的情绪可以被描述为对过去的某种“遗憾”，也许是过去的人或事物没有带来现在的“他”渴望的视野和可能性。至于最后一句，“不然，见到大海的该是他了”，这句暗示了某种改变或进步的可能性，也许是一种对未来的期待或希望。\n",
      "\n",
      "这首诗的结尾部分，并没有明确表达出“他”的具体身份和背景，所以我无法确定这首诗的具体主题或类型。但是，从诗歌的情感和语境来看，它可能是一首反映个人内心世界，对生活和身份不满的作品。\n",
      "\n",
      "不过这只是一个初步的分析，可能并不能完全准确解读这首诗的所有内涵。每个诗人或作品都有其独特的特点和背景，可能需要对特定的环境和语境进行更深入的研究和理解。希望这个答案能帮到您。\n",
      "=================\n",
      "问题0：\n",
      "《日出》的结构采用的是\n",
      "《日出》是曹禺的作品，它的结构采用的是戏剧结构。这部剧作通过一个普通小旅馆中的一群人物的群戏，展示了现实社会中压迫者的凶残、被压迫者的善良，并以其强大的社会纵深感揭示了社会矛盾的复杂性。\n",
      "\n",
      "曹禺的戏剧结构注重情节的紧凑和人物关系的构建，通过精心设计的情节和人物关系，将各种矛盾冲突推向高潮，从而达到揭示社会矛盾和人性冲突的目的。同时，曹禺也善于运用象征、暗示、隐喻等手法，增强戏剧的视觉化和艺术性。\n",
      "\n",
      "《日出》的结构遵循了戏剧的结构原则，通过一系列的事件和人物活动，逐步展现出复杂的社会矛盾和人性的挣扎。因此，可以认为《日出》的结构采用的是戏剧结构。\n",
      "=================\n",
      "问题1：\n",
      "以眉间尺为父报仇作为中心线索的小说是\n",
      "以眉间尺为父报仇作为中心线索的小说有：\n",
      "\n",
      "1. 《眉间尺》是孙见喜所著的历史小说，讲述春秋时期吴越争霸时期，忠臣伍子胥报仇雪恨的故事。\n",
      "2. 《血仇》是刘流所著的小说，讲述了眉间尺为父报仇的故事。\n",
      "\n",
      "此外，还有《中国剑侠传奇：眉间尺》等。如果您需要更多信息，可以到相关网站上查询。\n",
      "=================\n",
      "问题2：\n",
      "“他只是遗憾／他的祖先没有像他一样想过／不然，见到大海的该是他了”。以此作为结尾的诗歌是\n",
      "根据您提供的诗歌，我理解这首诗可能反映了一种失落或者对于自己生活和身份的不满情绪。对于海洋的渴望可能是渴望更大的冒险，或者探索新的可能性。这样的情绪可以被描述为对过去的某种“遗憾”，也许是过去的人或事物没有带来现在的“他”渴望的视野和可能性。至于最后一句，“不然，见到大海的该是他了”，这句暗示了某种改变或进步的可能性，也许是一种对未来的期待或希望。\n",
      "\n",
      "这首诗的结尾部分，并没有明确表达出“他”的具体身份和背景，所以我无法确定这首诗的具体主题或类型。但是，从诗歌的情感和语境来看，它可能是一首反映个人内心世界，对生活和身份不满的作品。\n",
      "\n",
      "不过这只是一个初步的分析，可能并不能完全准确解读这首诗的所有内涵。每个诗人或作品都有其独特的特点和背景，可能需要对特定的环境和语境进行更深入的研究和理解。希望这个答案能帮到您。\n",
      "=================\n",
      "问题3：\n",
      "下列含有小说《伤逝》的鲁迅作品集是\n",
      "包含小说《伤逝》的鲁迅作品集是**《呐喊》**。\n",
      "=================\n",
      "问题4：\n",
      "汪静之属于\n",
      "汪静之属于无产阶级革命家、诗人。\n",
      "\n",
      "汪静之，湖南宁乡人，早年就读于长沙高等师范学校。他是五四新文化运动中最早的诗界革命的倡导者之一，是现代文学史上第一个以新诗发表自己爱情宣言的诗人。他与鲁迅、周作人、胡适等都有过交往，出版了《蕙的风》《选诗余集》等作品。\n",
      "=================\n",
      "问题5：\n",
      "杂文《春末闲谈》中，细腰蜂的故事所要表达的是\n",
      "《春末闲谈》中的细腰蜂的故事主要是用来表达作者对于辛亥革命以后社会混乱、弊端重重的状态的不满，以及对于劳动人民的同情。\n",
      "\n",
      "文章通过对细腰蜂执着而顽强的求偶态的生动描写，无情鞭挞了辛亥革命后遗少们的精神面貌，深刻指出了他们的愚蠢、悲惨的结局，深化了文章的主题。 革命是需要进行到底的，中途换马或立宪劝善的结论只能是培养一批批披上神圣帷幔的“细腰蜂”。所以作者满腔热情地挥笔疾书，用锋利的笔触去勾勒和揭露这些“细腰蜂”丑恶形象。\n",
      "\n",
      "以上内容仅供参考，建议阅读原文以获取更全面和准确的信息。\n",
      "=================\n",
      "问题6：\n",
      "臧克家的第一部诗集是\n",
      "臧克家的第一部诗集是《烙印》。\n",
      "\n",
      "《烙印》是现代诗人臧克家于1932年3月出版的诗集，收录了诗人早期创作的近三十首新诗。这些诗作反映了社会动荡、人民苦难的时代背景，抒发了诗人对下层劳动人民的深切同情。其中最著名的《难民》一诗，在艺术上继承了古典诗歌的意境，语言朴素自然，风格上呈现出清新、悲悯的格调。该诗集对后世的青年诗人产生了很大的影响。此外，《烙印》的出版，得到了胡适、茅盾等人的高度评价，并给了臧克家极大的鼓舞。臧克家曾写道：“当时自己年轻浮躁，动笔就写东西，遇到苦难和不幸的人事就描写他们身上的‘苦难’，唯有写《烙印》时，才感到自己是在写作，在艺术上有了明确的追求目标。”。\n",
      "=================\n",
      "问题7：\n",
      "下列小说中运用了“戏剧穿插法”的是\n",
      "使用“戏剧穿插法”的小说是《高玉宝》\n",
      "\n",
      "高玉宝，以其独特的文学创作方法创作的小说《高玉宝》，采取的是小说穿插、儿童天真无邪的笔调，把一个普通农村少年的成长经历写得曲折离奇，跌宕起伏，引人入胜。这种写法，被称之“戏剧穿插法”。\n",
      "=================\n",
      "问题8：\n",
      "台静农的《记波外翁》中波外翁的性格是\n",
      "在台静农《记波外翁》中，波外翁的性格是：\n",
      "\n",
      "* 善良。他乐于助人，对待生活充满热情。\n",
      "* 随和。他从不与人计较，脾气特别好。\n",
      "* 传统。他对传统事物有深深的热爱和尊重。\n",
      "* 深沉。他有时会表现出孤独和落寞的一面。\n",
      "\n",
      "这些性格特点共同构成了波外翁丰富、立体的形象。\n",
      "=================\n",
      "问题9：\n",
      "下列哪一项不属于巴金的短篇小说集\n",
      "下列不属于巴金的短篇小说集的是：\n",
      "\n",
      "A. 《雾》\n",
      "B. 《家》\n",
      "C. 《寒夜》\n",
      "D. 《复仇》\n",
      "正确答案是：B. 《家》。\n",
      "本题考查文化常识，主要涉及文学名著的常识。《家》是巴金的小说代表作之一，其他三项都是巴金的短篇小说集，不属于其短篇小说集。故选B。\n",
      "=================\n"
     ]
    }
   ],
   "source": [
    "results = r.results()\n",
    "# 这里为了展示结果，仅取了前 3 个结果\n",
    "# 实际使用时可以去掉该限制\n",
    "for i, (input, output) in enumerate(zip(data[:3], results)):\n",
    "    print(f\"问题{i}：\")\n",
    "    print(input)\n",
    "    print(output['result'])\n",
    "    print(\"=================\")\n",
    "\n",
    "# 或者\n",
    "for i, (input, output) in enumerate(zip(data, r)):\n",
    "    print(f\"问题{i}：\")\n",
    "    print(input)\n",
    "    print(output.result()['result'])\n",
    "    print(\"=================\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`r.results()` 会同时返回所有结果，当数据量较大时可能会占用大量内存，可以调用 `r.wait()` 仅等待而不返回结果。\n",
    "\n",
    "此外可以通过 `r.finished_count()` 获取已完成的任务数量，`r.task_count()` 获取任务总数量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:57.911503Z",
     "start_time": "2024-02-21T08:12:57.884302Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10/10\n"
     ]
    }
   ],
   "source": [
    "# 获取完成的比例\n",
    "print(\"{}/{}\".format(r.finished_count(), r.task_count()))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于其他类型模型，数据 `List` 中的元素类型需要进行改变，例如 `ChatCompletion` 接受的参数是 Messages List，那么就要进行转换。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:12:57.911866Z",
     "start_time": "2024-02-21T08:12:57.887726Z"
    }
   },
   "outputs": [],
   "source": [
    "chat_data = [[{\n",
    "    \"role\": \"user\",\n",
    "    \"content\": question\n",
    "}] for question in data]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.125006Z",
     "start_time": "2024-02-21T08:12:57.892262Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:12:57] openapi_requestor.py:244 [t:13147648000]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:57] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:57] openapi_requestor.py:244 [t:13214806016]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:57] openapi_requestor.py:244 [t:13198016512]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:57] openapi_requestor.py:244 [t:13231595520]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:58] openapi_requestor.py:244 [t:13214806016]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:12:59] openapi_requestor.py:244 [t:13231595520]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:00] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:00] openapi_requestor.py:244 [t:13198016512]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:01] openapi_requestor.py:244 [t:13198016512]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] base.py:89 [t:13231595520]: All tasks finished, exeutor will be shutdown\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "《日出》的结构采用的是戏剧中最常见的一种“三一律”，即整个戏剧必须在3小时内演完，并集中在同一场景，通过一件事件来展开情节。该剧分为3幕，每一幕又可以划分为3个小节，总共9个场次。这种结构形式使得《日出》在有限的场景和时间内，紧凑地展现出人物的性格和命运，同时也为观众提供了强烈的节奏感。\n",
      "\n",
      "此外，《日出》的结构还采用了“间离效果”的手法，即通过运用象征、荒诞和幽默等元素，使观众在观看的过程中产生出一种疏离感，从而引发对现实世界的反思。这种手法打破了传统戏剧的叙事方式，使得剧情更加引人深思。\n",
      "\n",
      "因此，《日出》的结构采用了“三一律”的戏剧结构形式，紧凑地展现了人物的性格和命运，同时也采用了“间离效果”的手法，使得剧情更加引人深思。\n"
     ]
    }
   ],
   "source": [
    "r = qianfan.ChatCompletion().batch_do(chat_data, worker_num=5)\n",
    "r.wait()\n",
    "print(r.results()[0]['result'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果推理过程发生错误，会在调用 `result()` 时抛出异常，请注意进行异常处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.759886Z",
     "start_time": "2024-02-21T08:13:04.127202Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:13:04] openapi_requestor.py:244 [t:13147648000]: requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:244 [t:13181227008]: requesting llm api endpoint: /chat/eb-instant\n",
      "[ERROR] [02-21 16:13:04] openapi_requestor.py:219 [t:13147648000]: api request req_id: as-n30m197kxm failed with error code: 336002, err msg: Invalid JSON, please check https://cloud.baidu.com/doc/WENXINWORKSHOP/s/tlmyncueh\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "数据0发生异常：api return error, req_id: as-n30m197kxm code: 336002, msg: Invalid JSON\n",
      "数据1结果：你好，有什么我可以帮助你的吗？\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:13:04] base.py:89 [t:13181227008]: All tasks finished, exeutor will be shutdown\n"
     ]
    }
   ],
   "source": [
    "err_data=[\"你好\", [{\"role\":\"user\", \"content\":\"你好\"}]]\n",
    "\n",
    "r = qianfan.ChatCompletion().batch_do(err_data)\n",
    "\n",
    "for i, res in enumerate(r):\n",
    "    try:\n",
    "        print(\"数据{}结果：{}\".format(i, res.result()['result']))\n",
    "    except Exception as e:\n",
    "        print(f\"数据{i}发生异常：{e}\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 异步调用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.760476Z",
     "start_time": "2024-02-21T08:13:04.747851Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n"
     ]
    }
   ],
   "source": [
    "results = await qianfan.Completion().abatch_do(data, worker_num=5)\n",
    "# 返回值为一个 List，与输入列表中的元素一一对应\n",
    "# 正常情况下与 `ado` 返回类型一致，但如果发生异常则会是一个 Exception 对象\n",
    "for prompt, result in zip(data[:3], results):\n",
    "    if not isinstance(result, Exception):\n",
    "        print(prompt)\n",
    "        print(result['result'])\n",
    "        print(\"==============\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "同样的异步场景下，如果发生了错误需要进行异常处理，此时 results 中某个数据发生错误，那么对应的对象会是 Excepiton 类型，可以通过 `isinstance` 进行识别。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.786618Z",
     "start_time": "2024-02-21T08:13:04.760337Z"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n",
      "[INFO] [02-21 16:13:04] openapi_requestor.py:275 [t:8644182656]: async requesting llm api endpoint: /chat/eb-instant\n"
     ]
    }
   ],
   "source": [
    "err_data=[\"你好\", [{\"role\":\"user\", \"content\":\"你好\"}]]\n",
    "\n",
    "results = await qianfan.ChatCompletion().abatch_do(err_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.787008Z",
     "start_time": "2024-02-21T08:13:04.764429Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "数据0发生错误：no event loop found in current thread, please make sure the event loop is available when the object is initialized\n",
      "数据1发生错误：no event loop found in current thread, please make sure the event loop is available when the object is initialized\n"
     ]
    }
   ],
   "source": [
    "for i, result in enumerate(results):\n",
    "    if not isinstance(result, Exception):\n",
    "        print(\"数据{}结果：{}\".format(i, result['result']))\n",
    "    else:\n",
    "        print(\"数据{}发生错误：{}\".format(i, result))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2024-02-21T08:13:04.802691Z",
     "start_time": "2024-02-21T08:13:04.767480Z"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "vscode": {
   "interpreter": {
    "hash": "f553a591cb5da27fa30e85168a93942a1a24c8d6748197473adb125e5473a5db"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
